Abstract — In biometrics research and industry, matching infrared 
face images and optical face images to sketches is critical yet 
challenging. The most challenging problem in heterogeneous face 
recognition is that face images of the same person may differ 
considerably when they are captured in different image modalities. 
Here the image modalities are optical images, infrared images, and 
sketches, and the discrepancy between them is referred to as the 
modality gap. In particular, a large discrepancy exists between 
infrared face images and the corresponding optical face images 
because they are captured by different imaging devices. In this 
paper we focus on an approach to cross-modality face recognition 
problems such as sketch-photo and high-low resolution face 
matching. In this method, a new learning-based face descriptor is 
first used to extract the common features, and an effective 
matching method is then applied to the resulting features to 
obtain the final result. The method can be used in law 
enforcement. 

Index Terms — face recognition, optical image, sketch photo, 


I. INTRODUCTION 

Due to the increasing demands in application areas such as 
law enforcement, video surveillance, banking, and security 
system access authentication, automatic face recognition has 
attracted great attention in recent years. The advantages 
of facial identification over alternative methods, such as 
fingerprint identification, rest primarily on the fact that 
face recognition does not require the cooperation of those 
being checked. Moreover, face recognition systems are more 
convenient to use and more cost-effective, since 
recognition results can be corrected in uncertain cases by 
people without extensive training. Conventional optical 
imaging devices require appropriate illumination conditions 
to work properly, which is difficult to achieve satisfactorily in 
practical face recognition applications. To combat low 
illumination at night or indoors, infrared imaging devices 
have been widely applied in many automatic face recognition 
(AFR) systems. The task of infrared-based AFR systems is to 
match a probe face image taken with an infrared imaging 
device to a gallery of face images taken with an optical 
imaging device, which is considered an important 
application of heterogeneous face recognition (also known as 
cross-modality face recognition). The most challenging issue 
in heterogeneous face recognition is that face images 
associated with the same person but taken with different 
devices may be mismatched due to the great discrepancy 
between the different image modalities (optical and infrared), 
which is referred to as the modality gap. Infrared photos are 
usually blurred and low in contrast, and have a significantly 
different gray-level distribution from optical photos. 

In the common feature discriminant analysis (CFDA) 
method, a new learning-based feature descriptor is first 
developed to learn a set of optimal hyper-planes that quantize 
the continuous vector space into discrete partitions for common 
feature extraction, and an effective discriminant analysis 
technique is then applied for feature classification. We 
conduct extensive experiments on two large and challenging 
datasets of optical, infrared, and sketch face images to 
investigate the effectiveness of our new approach. It is also of 
great interest to investigate whether automatic recognition of 
sketches by computers can achieve performance similar to that 
of human beings. 


II. EXISTING METHODS 

Many methods exist for comparing optical images, 
infrared images, and sketch photos. The first category of 
approaches converts an image from one modality to the other 
by synthesizing a pseudo-image from the query image so that 
the matching process can be done within the same modality. 
For example, 
in [1] a holistic mapping method is applied to convert a photo 
image into a corresponding sketch image, and in [2]-[4] the 
authors used local patch-based mappings to convert images 
from one modality to the other for sketch-photo recognition. 
In [5] the authors synthesized visible-light (VIS) face images 
from near-infrared (NIR) face 
images with pose rectification. The second category of 
approaches is to design an appropriate representation that is 
insensitive to the modalities of images. 

For example, [6] used SIFT feature descriptors and 
multi-scale local binary patterns to represent both the sketch 
and photo images. Reference [7] proposed a learning-based 
algorithm to capture discriminative local face structures and 
effectively match photos and sketches. In [8], the authors 
designed a 
multi-scale common feature descriptor to combat the large 
intra-class difference incurred by the modality (VIS-NIR) 
difference. The third category of approaches is to compare 
the heterogeneous images in a common subspace where the 
modality difference is believed to be minimized [9]-[11]. For 
example, the authors of [12] applied the Bilinear Model (BLM) 
with Singular Value Decomposition (SVD) to develop a common 
content (associated with identity) space for a set of different 
styles (corresponding to modalities). In [11], the authors used 
the Canonical Correlation Analysis (CCA) technique to construct 
a common subspace where the correlations between infrared 
and optical images can be maximized. In [12], the authors 
applied CCA to cross-pose face recognition. In [13], 
Sharma applied the Partial Least Squares (PLS) method to 
derive a linear subspace in which cross-modality images are 
highly correlated, while at the same time preserving variances 
more effectively than the previous CCA method. In [13], Lei 
et al. proposed an effective subspace learning 
framework called coupled discriminant analysis for 
heterogeneous face recognition. In [14], a generic 
heterogeneous face recognition (HFR) 
framework was proposed in which both probe and gallery 
images are represented in terms of non-linear kernel 
similarities to a collection of prototype face images to 
enhance heterogeneous face recognition accuracy. 


III. PROPOSED METHOD 

In this method, we deal with three different 
modalities: infrared face images, optical face images, and 
sketch images. We propose a new approach called Common 
Feature Discriminant Analysis. A new learning-based face 
descriptor is first developed in which vectors in continuous 
space are converted to a discrete code representation in order 
to convert the image into an encoded image. The vectors in 
continuous space are converted to decimal codes with the 
help of pixel normalization techniques such as K-means and 
Random Forest algorithms, where the center pixel value is 
normalized with respect to the neighboring pixels. 

In this method, the following steps are performed to 
obtain the result. 

• The input image is converted into its corresponding 
grey image and binary image. 
• A preprocessing step is performed to obtain the cropped 
image, i.e. the hair portion is removed and the correct 
face portion is obtained. 
• Vector quantization is performed and the encoded image 
is obtained. 
• The encoded image is divided into a set of non-overlapping 
patches of size k x k. 
• The histogram is computed over each patch. 
• The outputs of all patches are concatenated into a long 
vector to form the final face feature. 
• The matching framework is applied. Principal 
component analysis is performed. 
• The class scatter matrix is also calculated. The result is 
the face image and the matched sketch photo. 

Vector quantization is an effective technique in 
mapping vectors of continuous space into discrete code 
representations, and has been widely used to create discrete 
image representations for object recognition. An image can 
be turned into an encoded image by converting each pixel 
into a specific code using the vector quantization technique. 
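
As a rough illustration of the idea, the sketch below shows one common form of vector quantization, in which each continuous vector is replaced by the index of its nearest codeword. This is not the hyper-plane-based encoder proposed in this paper; the function name `quantize`, the codebook, and the sample vectors are all assumptions made for the example.

```python
def quantize(vectors, codebook):
    """Map each continuous vector to the index of its nearest codeword."""
    codes = []
    for v in vectors:
        # Squared Euclidean distance from v to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        codes.append(dists.index(min(dists)))
    return codes


# Hypothetical 2-D codebook with three codewords (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vectors = [(0.1, 0.2), (0.9, -0.1), (0.2, 0.8)]
codes = quantize(vectors, codebook)
```

Each entry of `codes` is a discrete label standing in for the original vector; applying such a mapping to the vector of every pixel yields an encoded image.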

Here, we design a hyper-plane-based encoding 
method for effective feature representation of heterogeneous 
face images. In the feature extraction stage we use our CFDA 
approach for image encoding, and the image goes through the 
following feature extraction pipeline. For each pixel, we first 
sample its five d-neighbor 
(radius = d) pixels in each direction and then subtract the 
center pixel value. The centered vector is then normalized 
to unit L2-norm to form the associated pixel vector of 
that direction. Each pixel is thus associated with four vectors, 
forming four sets of training vectors that are used to train four 
encoders. Each encoder consists of two sets of mutually 
orthogonal hyper-planes which divide the vector space into 
four partitions. Vectors of each direction are encoded into a 
2-bit value, according to the partition in which the vector lies 
(i.e. 00 for the first partition, 01 for the second partition, and 
so forth). Finally, the four 2-bit values are concatenated to 
form an 8-bit value that is converted into a decimal value 
(from 0 to 255) as the code. 
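
The encoding of a single pixel can be sketched as follows. This is a minimal illustration under stated assumptions, not the trained encoder: the hyper-plane normals are fixed by hand here rather than learned, one encoder is reused for all four directions, and the mapping from hyper-plane sides to bit values is an arbitrary choice. The function names `pixel_vector`, `two_bit_code`, and `encode_pixel` are hypothetical.

```python
import math


def pixel_vector(center, neighbors):
    """Subtract the center pixel value and scale to unit L2 norm."""
    diff = [n - center for n in neighbors]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff] if norm > 0 else diff


def two_bit_code(vec, w1, w2):
    """Encode a vector by the side of each of two hyper-planes it lies on."""
    b1 = 1 if sum(a * b for a, b in zip(vec, w1)) >= 0 else 0
    b2 = 1 if sum(a * b for a, b in zip(vec, w2)) >= 0 else 0
    return (b1 << 1) | b2


def encode_pixel(vectors, encoders):
    """Concatenate four 2-bit codes (one per direction) into an 8-bit code."""
    code = 0
    for vec, (w1, w2) in zip(vectors, encoders):
        code = (code << 2) | two_bit_code(vec, w1, w2)
    return code


# Assumed, hand-picked orthogonal normals; the paper learns these instead.
w1 = [1, 0, 0, 0, 0]
w2 = [0, 1, 0, 0, 0]
encoders = [(w1, w2)] * 4  # in the paper, one encoder is trained per direction

center = 100
# Five sampled neighbor values for each of the four directions (made up).
neighbors_by_direction = [
    [110, 120, 100, 100, 100],
    [90, 120, 100, 100, 100],
    [110, 80, 100, 100, 100],
    [90, 80, 100, 100, 100],
]
vectors = [pixel_vector(center, n) for n in neighbors_by_direction]
code = encode_pixel(vectors, encoders)  # 0b11011000 == 216
```

Because two hyper-planes give two sign tests, each direction contributes exactly two bits, which is why four directions always produce a code in the range 0 to 255.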

Once the face image has been encoded, we use a 
dense sampling technique to extract the features. 
For this, the whole encoded image is divided into a set of 
overlapping patches of size c x c pixels (the step 
between adjacent patches sets the amount of overlap). We then 
compute, over each patch, the histogram of how often each code 
occurs, which gives a feature vector for each patch. The outputs 
of all patches are concatenated into a long vector to form the 
final face feature. 
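
The patch-histogram step described above can be sketched as follows. This is an illustrative sketch only: the function name `patch_histograms`, its parameters `size`, `step`, and `n_codes`, and the tiny 4 x 4 example image are assumptions, and a real encoded image would use 256 codes.

```python
def patch_histograms(encoded, size, step, n_codes=256):
    """Concatenate per-patch code histograms into one feature vector."""
    height, width = len(encoded), len(encoded[0])
    feature = []
    for top in range(0, height - size + 1, step):
        for left in range(0, width - size + 1, step):
            hist = [0] * n_codes
            for row in encoded[top:top + size]:
                for code in row[left:left + size]:
                    hist[code] += 1  # count occurrences of each code
            feature.extend(hist)
    return feature


# Toy encoded image with only four distinct codes (illustrative values).
encoded = [
    [0, 1, 2, 3],
    [0, 1, 2, 3],
    [0, 1, 2, 3],
    [0, 1, 2, 3],
]
# 2 x 2 patches with step 1 overlap by one pixel: 3 x 3 = 9 patches.
feature = patch_histograms(encoded, size=2, step=1, n_codes=4)
```

A step smaller than the patch size produces overlapping patches, while a step equal to the patch size produces non-overlapping ones; the feature length is the number of patches times the number of codes.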
The matching framework involves two levels of subspace 
analysis. In the first level, the large feature vector is 
divided into multiple segments of smaller feature vectors. 
Discriminant analysis is performed separately on each 
segment to extract the discriminant features. The goal of the 
first level is to generate more discriminative projections that 
reduce intra-class variations and avoid over-fitting. In the 
second level, the projected features from all the segments are 
combined and PCA is applied for efficient recognition. 
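
The text does not state which discriminant criterion is applied to each segment. Assuming the classical Fisher criterion, the per-segment projection could be written as below, where the symbols are introduced here for illustration only: x_i is a segment feature vector, X_c is the set of samples of class (person) c, n_c is its size, mu_c is the class mean, and mu is the overall mean.

```latex
S_w = \sum_{c=1}^{C} \sum_{x_i \in \mathcal{X}_c} (x_i - \mu_c)(x_i - \mu_c)^{\top}, \qquad
S_b = \sum_{c=1}^{C} n_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top}, \qquad
W^{*} = \arg\max_{W} \frac{\left| W^{\top} S_b W \right|}{\left| W^{\top} S_w W \right|}
```

Under this assumption, the within-class scatter S_w measures variation among images of the same person, including variation across modalities, so maximizing the ratio favors projections in which that variation is small relative to the separation between different people.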

The CFDA approach is proposed specifically for 
handling the optical-infrared face recognition problem. In the 
feature extraction stage, a learning-based feature descriptor is 
developed to maximize the correlations between the optical 
face images and the corresponding infrared images. In this way, 
the modality gap between the two kinds of face images can be 
significantly reduced; hence, it is expected that the resulting 
features will be well suited to the optical-infrared-sketch face 
recognition problem. Our new feature descriptor differs 
significantly from state-of-the-art descriptors, such as the 
widely used HOG and LBP. Instead of 
encoding the images using a hand-crafted encoding scheme, 
our feature descriptor learns a new encoding scheme to 
encode the common micro-structure of both the optical and 
infrared face images for effective feature representation. Our 
experimental results also support the effectiveness of our new 
descriptor over state-of-the-art descriptors. Our new feature 
descriptor also differs significantly from CITE (coupled 
information tree encoding) in [8]. The major differences 
between them are summarized as follows. (1) CITE is 
inherently a tree-based encoding method, while our feature 
descriptor is inherently a binary encoding scheme. (2) Unlike 
CITE, we encode a pixel 
in four directions to make full use of the geometric 
information. This also reflects the significant difference 
between them. 

IV. CONCLUSION 

In this paper we introduced a new approach, called 
common feature discriminant analysis (CFDA), for matching 
infrared face images, optical face images, and sketches. In 
CFDA, we first develop a new descriptor to effectively 
represent optical face images, infrared face images, and 
sketches and so reduce the modality gap; a two-level matching 
method is then applied for fast and effective matching as 
part of our proposed system. Extensive experiments on two 
large and challenging optical-infrared face datasets will be 
used to assess the improvement of our new approach 
over the state of the art. 